<?xml version="1.0"?>
<rss version="2.0">
  <channel><title>Quick Hacks</title><link>http://bill.welliver.org//space/pike/Quick Hacks</link><description>&lt;h3 class="heading-1"&gt;Clean up trashed SQLite blobs&lt;p class="paragraph"/&gt;&lt;/h3&gt;
Sometimes, especially when converting data to SQLite, you'll find that fields in a record with binary data are marked internally as text data. What this means is that Pike will try to treat the data as UTF-8 and convert it to a Pike string. Because binary data isn't always valid UTF-8, that conversion can fail and the query throws an error. The fix is to re-store the offending data, making sure it's marked as binary, rather than text.&lt;p class="paragraph"/&gt;
The following snippet is an example of how to do this. In our example table, called object_versions, the offending field is called "contents". Since some of the rows are fine (perhaps those records are storing real text), we only re-store the rows that fail. To find the failing rows, fetch each row individually and catch the error.&lt;p class="paragraph"/&gt;
You can use the SQLite "CAST" operator to force the data marked as text to be retrieved as binary (BLOB) data. Then you can write it back to the same row.&lt;p class="paragraph"/&gt;
&lt;div class="code"&gt;&lt;pre&gt;&lt;pre&gt;&#xD;
&lt;b&gt;&lt;font color=darkgreen&gt;array &lt;/font&gt;&lt;/b&gt;&lt;b&gt;&lt;font color=darkbrown&gt;ov&lt;/font&gt;&lt;/b&gt; = s-&amp;gt;query(&lt;i&gt;&lt;font color=darkred&gt;"select id from object_versions"&lt;/font&gt;&lt;/i&gt;);&lt;p class="paragraph"/&gt;
&lt;b&gt;&lt;font color=darkblue&gt;foreach&lt;/font&gt;&lt;/b&gt;(ov; ; &lt;b&gt;&lt;font color=darkgreen&gt;mapping &lt;/font&gt;&lt;/b&gt;&lt;b&gt;&lt;font color=darkbrown&gt;v&lt;/font&gt;&lt;/b&gt;)&#xD;
{           &#xD;
  &lt;b&gt;&lt;font color=darkgreen&gt;mixed &lt;/font&gt;&lt;/b&gt;&lt;b&gt;&lt;font color=darkbrown&gt;err&lt;/font&gt;&lt;/b&gt; = &lt;b&gt;&lt;font color=darkblue&gt;catch&lt;/font&gt;&lt;/b&gt;(s-&amp;gt;query(&lt;i&gt;&lt;font color=darkred&gt;"select contents from object_versions where id=:id"&lt;/font&gt;&lt;/i&gt;, (&amp;#91;&lt;i&gt;&lt;font color=darkred&gt;":id"&lt;/font&gt;&lt;/i&gt;:(&lt;b&gt;&lt;font color=darkgreen&gt;int&lt;/font&gt;&lt;/b&gt;&lt;b&gt;&lt;font color=darkbrown&gt;&lt;/font&gt;&lt;/b&gt;)v&amp;#91;&lt;i&gt;&lt;font color=darkred&gt;"object_versions.id"&lt;/font&gt;&lt;/i&gt;]])));            &#xD;
  &lt;b&gt;&lt;font color=darkblue&gt;if&lt;/font&gt;&lt;/b&gt;(err) &lt;font color=red&gt;// ah, a row with a problem. let's fix it...&#xD;
&lt;/font&gt;  {                                                                                                                             &#xD;
    werror(&lt;i&gt;&lt;font color=darkred&gt;"failed to fetch id %d\n"&lt;/font&gt;&lt;/i&gt;, (&lt;b&gt;&lt;font color=darkgreen&gt;int&lt;/font&gt;&lt;/b&gt;&lt;b&gt;&lt;font color=darkbrown&gt;&lt;/font&gt;&lt;/b&gt;)v&amp;#91;&lt;i&gt;&lt;font color=darkred&gt;"object_versions.id"&lt;/font&gt;&lt;/i&gt;]);&#xD;
    &lt;b&gt;&lt;font color=darkgreen&gt;mixed &lt;/font&gt;&lt;/b&gt;&lt;b&gt;&lt;font color=darkbrown&gt;t&lt;/font&gt;&lt;/b&gt; = s-&amp;gt;query(&lt;i&gt;&lt;font color=darkred&gt;"select CAST(contents as blob) as v from object_versions where id=:id"&lt;/font&gt;&lt;/i&gt;, (&amp;#91;&lt;i&gt;&lt;font color=darkred&gt;":id"&lt;/font&gt;&lt;/i&gt;:(&lt;b&gt;&lt;font color=darkgreen&gt;int&lt;/font&gt;&lt;/b&gt;&lt;b&gt;&lt;font color=darkbrown&gt;&lt;/font&gt;&lt;/b&gt;)v&amp;#91;&lt;i&gt;&lt;font color=darkred&gt;"object_versions.id"&lt;/font&gt;&lt;/i&gt;]]));&#xD;
    s-&amp;gt;query(&lt;i&gt;&lt;font color=darkred&gt;"update object_versions set contents=:contents where id=:id"&lt;/font&gt;&lt;/i&gt;, (&amp;#91;&lt;i&gt;&lt;font color=darkred&gt;":id"&lt;/font&gt;&lt;/i&gt;:(&lt;b&gt;&lt;font color=darkgreen&gt;int&lt;/font&gt;&lt;/b&gt;&lt;b&gt;&lt;font color=darkbrown&gt;&lt;/font&gt;&lt;/b&gt;)v&amp;#91;&lt;i&gt;&lt;font color=darkred&gt;"object_versions.id"&lt;/font&gt;&lt;/i&gt;], &lt;i&gt;&lt;font color=darkred&gt;":contents"&lt;/font&gt;&lt;/i&gt;: t&amp;#91;0]&amp;#91;&lt;i&gt;&lt;font color=darkred&gt;"object_versions.v"&lt;/font&gt;&lt;/i&gt;]]));&#xD;
  }&#xD;
}&#xD;
&lt;/pre&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p class="paragraph"/&gt;
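To check which rows are affected before and after the fix, SQLite's built-in typeof() function reports how each value is stored. The following is only a sketch: it assumes s is the same connection object as above, and that result columns can be read by their plain alias (kind and n are names made up for this example). Rows that were re-stored as binary should then be counted under "blob" rather than "text".&lt;p class="paragraph"/&gt;
&lt;div class="code"&gt;&lt;pre&gt;&lt;pre&gt;
// assumption: s is the same database connection used in the snippet above
// typeof() returns "text" for values marked as text and "blob" for binary values
array counts = s-&amp;gt;query("select typeof(contents) as kind, count(*) as n from object_versions group by typeof(contents)");
foreach(counts; ; mapping c)
  write("%s: %s\n", (string)c-&amp;gt;kind, (string)c-&amp;gt;n);
&lt;/pre&gt;&lt;/pre&gt;&lt;/div&gt;&lt;p class="paragraph"/&gt;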
&lt;h3 class="heading-1"&gt;Carrot2 Document Clustering&lt;p class="paragraph"/&gt;&lt;/h3&gt;
Interface with &lt;a href="http://www.carrot2.org" class="wiki_link_external" &gt;Carrot2&lt;/a&gt;'s Document Clustering Server via XML-RPC.&lt;p class="paragraph"/&gt;
&lt;div class="code"&gt;&lt;pre&gt;&lt;pre&gt;&#xD;
&lt;b&gt;&lt;font color=darkgreen&gt;object &lt;/font&gt;&lt;/b&gt;&lt;b&gt;&lt;font color=darkbrown&gt;x&lt;/font&gt;&lt;/b&gt; = Protocols.XMLRPC.Client(&lt;i&gt;&lt;font color=darkred&gt;"http://localhost:8081/xmlrpc/processor"&lt;/font&gt;&lt;/i&gt;);&lt;p class="paragraph"/&gt;
&lt;font color=red&gt;// input data is an array whose length is a multiple of 4.&#xD;
&lt;/font&gt;&lt;font color=red&gt;// each document input has 4 fields, so document n can be found at&#xD;
&lt;/font&gt;&lt;font color=red&gt;// inputdata&amp;#91;(4*n) .. (4*n) + 3]&#xD;
&lt;/font&gt;&lt;font color=red&gt;//&#xD;
&lt;/font&gt;&lt;font color=red&gt;// all 4 fields are required and are: &#xD;
&lt;/font&gt;&lt;font color=red&gt;// &amp;#91;0]id, &amp;#91;1]url, &amp;#91;2]title, &amp;#91;3]excerpt&#xD;
&lt;/font&gt;&lt;font color=red&gt;//&#xD;
&lt;/font&gt;&lt;font color=red&gt;// the sample data contains 1 input document.&#xD;
&lt;/font&gt;&lt;b&gt;&lt;font color=darkgreen&gt;array &lt;/font&gt;&lt;/b&gt;&lt;b&gt;&lt;font color=darkbrown&gt;inputdata&lt;/font&gt;&lt;/b&gt; = ({&lt;i&gt;&lt;font color=darkred&gt;"id0"&lt;/font&gt;&lt;/i&gt;, &lt;i&gt;&lt;font color=darkred&gt;"http://www.google.com"&lt;/font&gt;&lt;/i&gt;, &lt;i&gt;&lt;font color=darkred&gt;"google"&lt;/font&gt;&lt;/i&gt;, &lt;i&gt;&lt;font color=darkred&gt;"the google search engine"&lt;/font&gt;&lt;/i&gt;});&lt;p class="paragraph"/&gt;
&lt;b&gt;&lt;font color=darkgreen&gt;array &lt;/font&gt;&lt;/b&gt;&lt;b&gt;&lt;font color=darkbrown&gt;clusters&lt;/font&gt;&lt;/b&gt; = x&amp;#91;&lt;i&gt;&lt;font color=darkred&gt;"cluster.doCluster"&lt;/font&gt;&lt;/i&gt;](&lt;i&gt;&lt;font color=darkred&gt;"test query"&lt;/font&gt;&lt;/i&gt;, &#xD;
      (&amp;#91;&lt;i&gt;&lt;font color=darkred&gt;"dcs.clusters.only"&lt;/font&gt;&lt;/i&gt;:0]), (&amp;#91;]), inputdata)&amp;#91;0];&lt;p class="paragraph"/&gt;
  &lt;b&gt;&lt;font color=darkblue&gt;foreach&lt;/font&gt;&lt;/b&gt;(clusters;; &lt;b&gt;&lt;font color=darkgreen&gt;mapping &lt;/font&gt;&lt;/b&gt;&lt;b&gt;&lt;font color=darkbrown&gt;cl&lt;/font&gt;&lt;/b&gt;)&#xD;
    write(&lt;i&gt;&lt;font color=darkred&gt;"%s (%d)\n"&lt;/font&gt;&lt;/i&gt;, cl-&amp;gt;label, sizeof(cl-&amp;gt;documents));&lt;p class="paragraph"/&gt;
&lt;/pre&gt;&lt;/pre&gt;&lt;/div&gt;
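&lt;p class="paragraph"/&gt;
If the documents start out as mappings, the flat input array described in the comments above can be built with a small helper. This is a sketch only: flatten_documents and the mapping keys id, url, title and excerpt are names chosen for this example, not part of Carrot2 or Pike.&lt;p class="paragraph"/&gt;
&lt;div class="code"&gt;&lt;pre&gt;&lt;pre&gt;
// hypothetical helper: turn an array of document mappings into the flat
// 4-fields-per-document array expected as inputdata
array flatten_documents(array(mapping) docs)
{
  array flat = ({});
  foreach(docs; ; mapping d)
    flat += ({ d-&amp;gt;id, d-&amp;gt;url, d-&amp;gt;title, d-&amp;gt;excerpt });
  return flat;
}
// builds the same one-document sample array as above
array inputdata = flatten_documents(({ (&amp;#91;"id":"id0", "url":"http://www.google.com", "title":"google", "excerpt":"the google search engine"]) }));
&lt;/pre&gt;&lt;/pre&gt;&lt;/div&gt;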
</description><generator>Fins 0.9.7</generator><docs>http://blogs.law.harvard.edu/tech/rss</docs></channel>
</rss>
